Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(storage): use dynamic trait for storage interfaces #846

Closed
wants to merge 6 commits into from

Conversation

wangrunji0408
Copy link
Member

@wangrunji0408 wangrunji0408 commented Apr 16, 2024

Signed-off-by: Runji Wang [email protected]

Dynamic async traits are not so bad for performance, as long as they are not called frequently. The advantage is that they are less invasive as they do not require generic parameters everywhere, and they allow for extensible plugins in the future.

This PR uses #[async_trait] to refactor storage traits into dyn traits. It removes <S: Storage> from executors. The only performance critical part is the TxnIterator::next_batch which is called frequently in scanning. We refactor it into a BoxStream to eliminate boxing the future on each call.

Performance comparison on TPCH
tpch time
q1 +5.9%
q17 +5.6%
q18 +5.4%
q13 +4.1%
q11 +3.9%
q14 +3.9%
q7 +3.7%
q4 +3.2%
q3 +3.1%
q10 +2.2%
q9 +1.8%
q5 +1.4%
q15 +1.2%
q16 +1.2%
q19 +1.2%
q6 +0.4%
q12 -0.4%
q20 -0.6%
q22 -2.5%
q2 -2.9%
q8 -2.9%
q21 -5.4%

Signed-off-by: Runji Wang <[email protected]>
we replace the async iterator trait by a boxed stream to avoid memory allocation on each call

Signed-off-by: Runji Wang <[email protected]>
@TennyZhuang
Copy link
Collaborator

TennyZhuang commented Apr 17, 2024

Weakly -1

The current dynamic traits have too many limitations, making them seem like unfinished products.

And pass generic parameters are not really very painful.

@wangrunji0408
Copy link
Member Author

also objected by @MrCroxx

@MrCroxx
Copy link

MrCroxx commented Apr 17, 2024

also objected by @MrCroxx

The #[async_trait] allocates a boxed future on each call. The overhead can be high for memory-only path. e.g. cache hits.

@skyzh
Copy link
Member

skyzh commented Jun 16, 2024

Given that the storage part of the system runs in batches (i.e., we always fetch as many as 1024 rows if possible), I don't think using a dynamic trait would cause a large regression. However, I don't see the code is simplified significantly without static dispatch, except that the S: Storage thing gets removed. Therefore, I'm kind of in the middle of approving or turning down this pull request :(

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants